Abstract: Abstraction is a successful technique in software
verification, and interpolation on infeasible error paths is a
successful approach to automatically detect the right level of
abstraction in counterexample-guided abstraction refinement.
Because the interpolants have a significant influence on the
quality of the abstraction, and thus on the effectiveness of the
verification, an algorithm for deriving the best possible interpolants
is desirable. We present an analysis-independent technique that
makes it possible to extract several alternative sequences of
interpolants from one given infeasible error path, if there are
several reasons for infeasibility in the error path. We take as
input the given infeasible error path and apply a slicing technique
to obtain a set of error paths that are more abstract than
the original error path but still infeasible, each for a different
reason. The (more abstract) constraints of the new paths can be
passed to a standard interpolation engine, in order to obtain
a set of interpolant sequences, one for each new path. The
analysis can then choose from this set of interpolant sequences
and select the most appropriate, instead of being bound to
the single interpolant sequence that the interpolation engine
would normally return. For example, we can select based on
domain types of variables in the interpolants, prefer to avoid
loop counters, or compare with templates for potential loop
invariants, and thus control what kind of information occurs
in the abstraction of the program. We implemented the new
algorithm in the open-source verification framework CPAchecker
and show that our proof-technique-independent approach yields
a significant improvement of the effectiveness and efficiency of
the verification process.

I. Introduction 

In the field of automatic software verification, abstraction
is a well-understood and widely-used technique, enabling
the successful verification of real-world, industrial
programs. Abstraction makes it possible to
omit certain aspects of the concrete semantics that are not
necessary to prove or disprove the program’s correctness. This
may lead to a massive reduction of a program’s state space,
such that verification becomes feasible within reasonable time
and resource limits. For example, Slam uses predicate
abstraction for creating an abstract model of the software.
One of the current research directions is to invent techniques 
to automatically find suitable abstractions. An ideal model is 
abstract enough to avoid state-space explosion and still contains 
enough detail to verify the property. Counterexample-guided 
abstraction refinement (CEGAR) [14] is an automatic technique
that starts with a very coarse abstraction and iteratively refines 
an abstract model using infeasible error paths (witnesses of 
property violations). If the analysis does not find an error path
in the abstract model, the analysis terminates and reports the
verdict TRUE (property holds). Because the abstract model
over-approximates the program, the verdict applies for the 
actual program. If the analysis finds an error path, the path is 
checked for feasibility. If the found error path does not contain a 
contradiction and the error is indeed reachable according to the 
concrete program semantics, then the error path is feasible and 
a real error was found. The analysis terminates and reports the 
verdict FALSE (program violates property). If, however, the error 
path is actually infeasible, then a “spurious counterexample” 
was found, and the property violation is due to a too coarse 
abstract model. The (contradicting) constraints of the infeasible 
error path can then be passed to an interpolation engine, and 
the obtained interpolants identify information that is needed for 
refining the current abstraction, such that the same infeasible 
error path is excluded in subsequent CEGAR iterations. After 
refinement, the analysis proceeds with rebuilding a refined 
abstract model in the next CEGAR iteration. Several successful 
software verifiers (e.g., Slam, Blast, CPAchecker,
Ufo) make use of the CEGAR loop, which is illustrated in
Figure 1.

Craig interpolation is a technique that yields for two 
contradicting formulas an interpolant formula that contains less 
information than the first formula, but is still expressive enough 
to contradict the second formula. This can be extended to a 
sequence of formulas. In software verification, interpolation was
first used for the domain of predicate abstraction [18], and later
for value-analysis domains. Independent of the analysis
domain, interpolants for path constraints of infeasible error 
paths can be used to refine abstract models and to eliminate 
the infeasible error paths in subsequent CEGAR iterations. 
In this context, it is important to point out that the choice
of interpolants is crucial for the performance of the analysis.
Figure 2 gives an example: In this program, the analysis will
typically find the shown error path, which is infeasible for
two different reasons: both the value of i and the value of b
can be used to find a contradiction. In general, it is
beneficial for the verifier to track the value of the boolean
variable b, and not to track the value of the loop-counter
variable i, because the latter has many more possible values,
and tracking it would usually lead to an expensive unrolling of
the loop. If only variable b is tracked, the verifier
can conclude the safety of the program without unrolling
the loop. Thus, we would like to get from the interpolation
engine the interpolant sequence shown on the left of Figure 2
(only over the boolean variable) and not the interpolant sequence
shown on the right (over the loop counter). However, interpolation
engines typically do not allow guiding the interpolation process
towards a “good”, or away from a “bad”, interpolant sequence.
The interpolation engines inherently cannot do a better job here:
they do not have access to information such as whether a specific
variable is a loop counter and should be avoided in the interpolant.
Instead, which interpolant is returned depends solely on the
internal algorithms of the interpolation engine. This is especially
true if the model checker in use does not provide its own
implementation of an interpolation engine but rather makes calls
to a library, e.g., a Satisfiability Modulo Theories (SMT) solver,
which normally cannot be controlled on such a fine-grained level.
In this case, the model checker is stuck with what the interpolation
engine returns, be it good or bad for the verification process.

Fig. 1: Example of the CEGAR loop, using a single error path
for interpolation

    #include <assert.h>

    extern int f(int x);

    int main() {
      int b = 0;
      int i = 0;
      while (1) {
        if (i > 9) break;
        f(i++);
      }
      if (b != 0) {
        if (i != 10) {
          assert(0);  /* error location */
        }
      }
      return 0;
    }

Fig. 2: Input program, with infeasible error path, and a “good”
and a “bad” interpolant sequence

Fig. 3: Example of the CEGAR loop, using a set of paths for
interpolation

Therefore, we present an approach to guide the interpolation
engine to produce interpolants that we would like to get, without
changing the interpolation engine. For this, we extract from an
infeasible error path a set of infeasible sliced paths, all stemming
from the same infeasible error path. Each of these sliced paths
can be used for interpolation, yielding different interpolant
sequences that are all expressive enough to eliminate the
original infeasible error path. As depicted in Figure 3, our
approach fits well into CEGAR, because only the refinement
component needs customization, and the new approach remains
compatible with off-the-shelf interpolation engines.

Contributions. This paper makes the following key contributions:

• we introduce a domain- and analysis-independent method 
to extract infeasible sliced paths from infeasible error 
paths, 

• we prove that interpolants for such a sliced path are also 
interpolants for the original infeasible error path, 

• we explain that (and how) it is possible to obtain better
interpolants (in comparison to the standard approach) 
from a set of infeasible sliced paths, and that refinement 
selection plays a significant role in CEGAR, 

• we implement the presented concepts in the open-source 
framework for software verification CPAchecker, and 

• we show by experimental results that the novel approach 
to obtain better interpolants significantly improves the 
verification effectiveness and efficiency. 


Related Work. The desire to control what interpolants an
interpolation engine produces, and trying to make the verification
process more efficient by finding good interpolants, is not
new. Our goal is to contribute a technique that is independent 
from the abstract domain that a program analysis uses, and 
independent from specific properties of interpolation engines. 

The first work in this direction suggested making
the interpolation configurable such that the user has a choice
between strong and weak interpolants, by controlling the
interpolant strength. This approach is unfortunately not
implemented in the standard interpolation engines; it requires
rewriting the algorithm that extracts interpolants from resolution
proofs. The technique of interpolation abstractions [22], a
generalization of term abstraction, can be used to guide
solvers to pick good interpolants. This is achieved by extending
the concrete interpolation problem by so-called templates (e.g.,
terms, formulas, uninterpreted functions with free variables) to
obtain a more abstract interpolation problem. An interpolant
for the abstract interpolation problem is also a solution to the
concrete interpolation problem. Because these interpolation
abstractions form a lattice, suitable interpolants can be chosen
using a cost function. Our approach is independent from the
abstract domain and interpolation engine, and does not rely
on SMT solving. For example, our technique is applicable to
value and octagon domains.

Path slicing is a technique that was introduced to reduce
the burden on the interpolation engine: Before the constraints
of the path are given to the interpolation engine, the constraints
are weakened by removing facts that are not important for the
infeasibility of the error path, i.e., a more abstract error path
is constructed. We also make the error path more abstract, but
in different directions to obtain different interpolant sequences,
from which we can choose the ones that yield the best abstract
model. While path slicing is interested in reducing the run time
of the interpolation engine (by omitting some facts), we are
interested in reducing the run time of the verification engine
(by spending more time on interpolation but creating a better
abstract model).

II. Background 

Our approach is based on several existing concepts, and in
this section we remind the reader of some basic definitions
and our previous work in this field.

Programs, Control-Flow Automata, States, Paths, Precisions.
We restrict the presentation to a simple imperative programming
language, where all operations are either assignments
or assume operations, and all variables range over integers.¹
A program is represented by a control-flow automaton (CFA).
A CFA $A = (L, l_0, G)$ consists of a set $L$ of program
locations, which model the program counter, an initial program
location $l_0 \in L$, which models the program entry, and a set
$G \subseteq L \times Ops \times L$ of control-flow edges, which model the
operations that are executed when control flows from one
program location to the next. The set of program variables that
occur in operations from $Ops$ is denoted by $X$. A verification
problem $P = (A, l_e)$ consists of a CFA $A$, representing the
program, and a target program location $l_e \in L$, which represents
the specification, i.e., “the program must not reach location $l_e$”.

A concrete data state of a program is a variable assignment
$cd : X \to \mathbb{Z}$, which assigns to each program variable an integer
value; the set of integer values is denoted as $\mathbb{Z}$. A concrete
state of a program is a pair $(l, cd)$, where $l \in L$ is a program
location and $cd$ is a concrete data state. The set of all concrete
states of a program is denoted by $C$; a subset $r \subseteq C$ is called
region. Each edge $g \in G$ defines a labeled transition relation
$\xrightarrow{g} \subseteq C \times \{g\} \times C$. The complete transition relation $\to$ is the
union over all control-flow edges: $\to = \bigcup_{g \in G} \xrightarrow{g}$. We write
$c \xrightarrow{g} c'$ if $(c, g, c') \in \to$, and $c \to c'$ if there exists a $g$ with $c \xrightarrow{g} c'$.

An abstract data state represents a region of concrete data
states, formally defined as abstract variable assignment. An
abstract variable assignment is a partial function $v : X \rightharpoonup \mathbb{Z}$
or $\bot$, where $v$ maps variables in its definition range to integer
values, and $\bot$ is used to represent no variable assignment (i.e.,
no value is possible, similar to the predicate false in logic).
The special abstract variable assignment $\top = \{\}$ does not
map any variable to a value and is used as initial abstract
variable assignment in a program analysis. Variables that
do not occur in the definition range of an abstract variable
assignment are either omitted on purpose for abstraction in the
analysis, or the analysis is not able to determine a concrete
value (e.g., resulting from an uninitialized variable declaration
or from an external function call). For two partial functions $f$
and $f'$, we write $f(x) = y$ for the predicate $(x, y) \in f$, and
$f(x) = f'(x)$ for the predicate $\exists c : (f(x) = c) \wedge (f'(x) = c)$.
We denote the definition range for a partial function $f$ as
$\mathrm{def}(f) = \{x \mid \exists y : f(x) = y\}$, and the restriction of a partial
function $f$ to a new definition range $Y$ as $f_{|Y} = f \cap (Y \times \mathbb{Z})$.
An abstract variable assignment $v$ represents the set $[\![v]\!]$ of all
concrete data states $cd$ for which $v$ is valid, formally: $[\![\bot]\!] = \{\}$
and for all $v \neq \bot$, $[\![v]\!] = \{cd \mid \forall x \in \mathrm{def}(v) : v(x) = cd(x)\}$. The
abstract variable assignment $\bot$ is called contradicting. The
implication for abstract variable assignments is defined as
follows: $v$ implies $v'$ (written $v \Rightarrow v'$) if $v = \bot$, or for all
variables $x \in \mathrm{def}(v')$ we have $v(x) = v'(x)$. The conjunction
for abstract variable assignments $v$ and $v'$ is defined as:

$$v \wedge v' = \begin{cases} \bot & \text{if } v = \bot \text{ or } v' = \bot \text{ or } \exists x \in \mathrm{def}(v) \cap \mathrm{def}(v') : v(x) \neq v'(x) \\ v \cup v' & \text{otherwise} \end{cases}$$

¹Our implementation is based on CPAchecker, which operates on C programs;
non-recursive function calls are supported.
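The definitions of implication and conjunction above can be made concrete with a minimal Python sketch. It is only an illustration under stated assumptions: abstract variable assignments are modeled as dictionaries, `None` stands for the contradicting assignment, and the names `implies` and `conjoin` are hypothetical and not taken from CPAchecker. The case "non-contradicting assignment implies the contradicting one" is not spelled out in the text and is treated as false here.

```python
from typing import Dict, Optional

# An abstract variable assignment: a partial map from variables to integers,
# or None for the contradicting assignment (bottom). Modeling assumption.
Assignment = Optional[Dict[str, int]]

TOP: Assignment = {}       # maps no variable to a value
BOTTOM: Assignment = None  # no value is possible


def implies(v: Assignment, w: Assignment) -> bool:
    """v implies w if v is bottom, or v agrees with w on every variable in def(w)."""
    if v is None:
        return True
    if w is None:
        return False  # assumption: a satisfiable assignment never implies bottom
    return all(x in v and v[x] == w[x] for x in w)


def conjoin(v: Assignment, w: Assignment) -> Assignment:
    """Conjunction: bottom on any conflict, otherwise the union of both maps."""
    if v is None or w is None:
        return None
    if any(v[x] != w[x] for x in v.keys() & w.keys()):
        return None
    return {**v, **w}


combined = conjoin({"b": 0}, {"i": 10})
conflict = conjoin({"b": 0}, {"b": 1})
```

In this encoding, `combined` holds values for both `b` and `i`, while `conflict` is the contradicting assignment because the two operands disagree on `b`.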

The semantics of an operation $op \in Ops$ is defined by the
strongest-post operator $SP_{op}(\cdot)$: given an abstract variable
assignment $v$, $SP_{op}(v)$ represents the set of concrete data
states that are reachable from the concrete data states in the
set $[\![v]\!]$ by executing $op$. Formally, given an abstract variable
assignment $v$ and an assignment operation $x := exp$, we have
$SP_{x := exp}(v) = v_{|X \setminus \{x\}} \wedge v_x$ with

$$v_x = \begin{cases} \{(x, c)\} & \text{if } c \in \mathbb{Z} \text{ is the result of the arithmetic evaluation of expression } exp_{/v} \\ \{\} & \text{otherwise (if } exp_{/v} \text{ cannot be evaluated)} \end{cases}$$

where $exp_{/v}$ denotes the interpretation of expression $exp$
for the abstract variable assignment $v$. Given an abstract
variable assignment $v$ and an assume operation $[p]$, we
have $SP_{[p]}(v) = \bot$ if $v = \bot$ or the predicate $p_{/v}$
is unsatisfiable, or we have $SP_{[p]}(v) = v \wedge v_p$ with
$v_p = \{(x, c) \in (X \setminus \mathrm{def}(v)) \times \mathbb{Z} \mid p_{/v} \Rightarrow (x = c)\}$, and

$$p_{/v} = p \wedge \bigwedge_{y \in \mathrm{def}(v)} y = v(y).$$
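The strongest-post operator for the two kinds of operations can be approximated in a few lines of Python. This is a simplified sketch, not the operator as defined above: expressions and predicates are modeled as Python callables over a dictionary (an assumption of this example), a `KeyError` stands for "cannot be evaluated", and for assume operations the sketch does not derive the additional values $v_p$; it only detects contradictions, which keeps the result a sound over-approximation. The names `sp_assign` and `sp_assume` are hypothetical.

```python
from typing import Callable, Dict, Optional

Assignment = Optional[Dict[str, int]]  # None encodes the contradicting assignment


def sp_assign(v: Assignment, x: str, exp: Callable[[Dict[str, int]], int]) -> Assignment:
    """Strongest post of `x := exp`: forget x, then add its new value if computable."""
    if v is None:
        return None
    result = {y: c for y, c in v.items() if y != x}  # v restricted to X \ {x}
    try:
        result[x] = exp(v)
    except KeyError:
        pass  # exp refers to an untracked variable: x stays unknown
    return result


def sp_assume(v: Assignment, p: Callable[[Dict[str, int]], bool]) -> Assignment:
    """Strongest post of `[p]`: bottom if p is refuted by v, otherwise v (simplified)."""
    if v is None:
        return None
    try:
        return v if p(v) else None
    except KeyError:
        return v  # p cannot be decided under v: keep v unchanged


v0 = sp_assign({}, "i", lambda s: 0)
v1 = sp_assign(v0, "i", lambda s: s["i"] + 1)
blocked = sp_assume({"b": 0}, lambda s: s["b"] != 0)
```

Here `v1` records the incremented value of `i`, and `blocked` is the contradicting assignment because the assume operation `[b != 0]` is refuted by `b = 0`.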

A path $\sigma$ is a sequence $\langle (op_1, l_1), \ldots, (op_n, l_n) \rangle$ of
pairs of an operation and a location. The path $\sigma$ is
called program path if for every $i$ with $1 \le i \le n$ there
exists a CFA edge $g = (l_{i-1}, op_i, l_i)$ and $l_0$ is the initial
program location, i.e., $\sigma$ represents a syntactic walk
through the CFA. The result of appending the pair
$(op_n, l_n)$ to a path $\sigma = \langle (op_1, l_1), \ldots, (op_m, l_m) \rangle$ is defined
as $\sigma \wedge (op_n, l_n) = \langle (op_1, l_1), \ldots, (op_m, l_m), (op_n, l_n) \rangle$.
Every path $\sigma = \langle (op_1, l_1), \ldots, (op_n, l_n) \rangle$ defines a constraint
sequence $\gamma_\sigma = \langle op_1, \ldots, op_n \rangle$. The conjunction
$\gamma \wedge \gamma'$ of two constraint sequences $\gamma = \langle op_1, \ldots, op_n \rangle$
and $\gamma' = \langle op'_1, \ldots, op'_m \rangle$ is defined as their concatenation,
i.e., $\gamma \wedge \gamma' = \langle op_1, \ldots, op_n, op'_1, \ldots, op'_m \rangle$, the implication
of $\gamma$ and $\gamma'$ (denoted by $\gamma \Rightarrow \gamma'$) as the implication of
their strongest-post assignments $SP_\gamma(\top) \Rightarrow SP_{\gamma'}(\top)$, and
$\gamma$ is contradicting if $SP_\gamma(\top) = \bot$. The semantics of a path
$\sigma = \langle (op_1, l_1), \ldots, (op_n, l_n) \rangle$ is defined as the successive
application of the strongest-post operator to each operation
of the corresponding constraint sequence $\gamma_\sigma$:
$SP_{\gamma_\sigma}(v) = SP_{op_n}(\ldots SP_{op_1}(v) \ldots)$. The set of concrete program states
that result from running a program path $\sigma$ is represented
by the pair $(l_n, SP_{\gamma_\sigma}(\top))$, where $\top$ is the initial abstract
variable assignment. A path $\sigma$ is feasible if $SP_{\gamma_\sigma}(\top)$ is not
contradicting, i.e., $SP_{\gamma_\sigma}(\top) \neq \bot$. A concrete state $(l_n, cd_n)$
is reachable, denoted by $(l_n, cd_n) \in Reach$, if there exists
a feasible program path $\sigma = \langle (op_1, l_1), \ldots, (op_n, l_n) \rangle$ with
$cd_n \in [\![SP_{\gamma_\sigma}(\top)]\!]$. A location $l$ is reachable if there exists a
concrete data state $cd$ such that $(l, cd)$ is reachable. A program
is safe (the specification is satisfied) if $l_e$ is not reachable.
A path $\sigma = \langle (op_1, l_1), \ldots, (op_n, l_e) \rangle$, which ends in $l_e$, is
called error path.

Algorithm 1 CEGAR($D$, $e_0$, $\pi_0$), cf. [11]

Input: a CPA with dynamic precision adjustment $D$ and
    an initial abstract state $e_0 \in E$ with precision $\pi_0 \in (L \to 2^\Pi)$
Output: verification result TRUE (property holds) or FALSE
Variables: a set reached of elements of $E \times (L \to 2^\Pi)$,
    a set waitlist of elements of $E \times (L \to 2^\Pi)$, and
    an error path $\sigma = \langle (op_1, l_1), \ldots, (op_n, l_n) \rangle$

    reached := $\{(e_0, \pi_0)\}$; waitlist := $\{(e_0, \pi_0)\}$; $\pi$ := $\pi_0$
    while true do
        (reached, waitlist) := CPA($D$, reached, waitlist)
        if waitlist = $\{\}$ then
            return true
        else
            $\sigma$ := ExtractErrorPath(reached)
            if IsFeasible($\sigma$) then    // error path is feasible: report bug
                return false
            else    // error path is infeasible: refine and restart
                $\pi$ := $\pi \cup$ Refine($\sigma$)
                reached := $\{(e_0, \pi)\}$; waitlist := $\{(e_0, \pi)\}$

The precision is a function $\pi : L \to 2^\Pi$, where $\Pi$ depends
on the abstract domain used by the analysis. It assigns to each
program location some analysis-dependent information that
defines the level of abstraction of the analysis. For example,
when using predicate abstraction, the set $\Pi$ is a set of predicates
over program variables. When using a value domain, the set $\Pi$
is the set $X$ of program variables, and a precision defines
which program variables should be tracked by the analysis at
which program location.

Counterexample-Guided Abstraction Refinement
(CEGAR). CEGAR is a technique for automatic iterative
refinement of an abstract model. CEGAR is based
on three concepts: (1) a precision, which determines the
current level of abstraction, (2) a feasibility check, deciding
if an error path (the counterexample) is feasible, and (3) a
refinement procedure, which takes as input an infeasible error
path and extracts a precision to refine the abstract model
such that the infeasible error path is eliminated from further
exploration. Algorithm 1 shows an outline of a generic and
simple CEGAR algorithm. It uses the CPA algorithm
for program analysis with dynamic precision adjustment and
an abstract domain $D$ that is formalized as a configurable
program analysis (CPA) with dynamic precision adjustment.
The CPA uses a set $E$ of abstract states and a set $L \to 2^\Pi$
of precisions. The analysis algorithm computes the sets
reached and waitlist, which represent the current reachable
abstract states with precisions and the frontier, respectively.
The analysis algorithm is run first with $\pi_0$ as coarse initial
precision (usually $\pi_0(l) = \{\}$ for all $l \in L$). If all program
states have been exhaustively checked and no error was
reached, indicated by an empty waitlist, then the CEGAR
algorithm terminates and reports TRUE (program is safe). If
the analysis algorithm finds an error in the abstract state
space, then it stops and returns the yet incomplete sets
reached and waitlist. Now the corresponding abstract error
path is extracted from the set reached, using the procedure
ExtractErrorPath, and passed to the procedure IsFeasible for
the feasibility check. If the abstract error path is feasible,
meaning there exists a corresponding concrete error path,
then this error path represents a violation of the specification
and the algorithm terminates, reporting FALSE. If the error
path is infeasible, i.e., does not correspond to a concrete
program path, then the precision was too coarse and needs to
be refined. The refinement step is performed by the procedure
Refine (cf. Alg. 2), which returns a precision $\pi$ that makes the
analysis strong enough to exclude the infeasible error path
from future state-space explorations. This returned precision is
used to extend the current precision of the CEGAR algorithm,
which starts its next iteration, delegating to the analysis
algorithm the re-computation of the sets reached and waitlist
based on this refined precision. CEGAR is often used with
lazy abstraction to avoid re-discovering the whole state
space after each refinement, but instead removing only those
parts of reached and waitlist that need to be re-analyzed with
the new precision.

Algorithm 2 Refine($\sigma$)

Input: an infeasible error path $\sigma = \langle (op_1, l_1), \ldots, (op_n, l_n) \rangle$
Output: a precision $\pi$
Variables: a constraint sequence $\Gamma$

    $\Gamma$ := $\langle \rangle$
    $\pi(l)$ := $\{\}$, for all program locations $l$
    for $i$ := 1 to $n - 1$ do
        $\gamma^+$ := $\langle op_{i+1}, \ldots, op_n \rangle$
        $\Gamma$ := Interpolate($\Gamma \wedge op_i$, $\gamma^+$)    // inductive interpolation
        $\pi(l_i)$ := ExtractPrecision($\Gamma$)    // create precision based on $\Gamma$
    return $\pi$
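The control flow of the CEGAR loop described above can be sketched in Python as follows. This is a schematic illustration only: the analysis, the feasibility check, and the refinement are passed in as callables, and the precision is flattened to a single set of tracked variables instead of a function from locations to sets (both are simplifying assumptions of this example, as are all names used). The toy components at the end merely mimic the situation in which tracking `b` suffices to prove safety.

```python
from typing import Callable, FrozenSet, List, Optional

Precision = FrozenSet[str]
Path = List[str]


def cegar(
    analyze: Callable[[Precision], Optional[Path]],
    is_feasible: Callable[[Path], bool],
    refine: Callable[[Path], Precision],
    precision: Precision = frozenset(),
) -> bool:
    """Return True if the property holds, False if a feasible error path is found."""
    while True:
        error_path = analyze(precision)  # None: abstract model has no error path
        if error_path is None:
            return True
        if is_feasible(error_path):
            return False
        # Infeasible error path: extend the precision and restart the analysis.
        precision = precision | refine(error_path)


# Toy stand-ins (hypothetical): the abstract model reports an error path
# until variable b is tracked; that path is always infeasible.
def toy_analyze(precision: Precision) -> Optional[Path]:
    return None if "b" in precision else ["b := 0", "[b != 0]"]


def toy_refine(path: Path) -> Precision:
    return frozenset({"b"})


verdict = cegar(toy_analyze, lambda path: False, toy_refine)
```

The loop terminates only if the refinement makes progress, i.e., if each returned precision actually excludes the reported error path.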

Interpolation for Constraint Sequences. An interpolant for
two constraint sequences $\gamma^-$ and $\gamma^+$, such that $\gamma^- \wedge \gamma^+$ is
contradicting, is a constraint sequence $\Gamma$ for which 1) the
implication $\gamma^- \Rightarrow \Gamma$ holds, 2) the conjunction $\Gamma \wedge \gamma^+$ is
contradicting, and 3) the interpolant $\Gamma$ contains in its constraints
only variables that occur in both $\gamma^-$ and $\gamma^+$.

In the following, we will introduce our novel approach, which 
extends the procedure Refine to not only perform interpolation
on a single infeasible error path, and returning an arbitrary 
interpolant, but instead, interpolate a set of infeasible sliced 
prefixes stemming from this single infeasible error path, and 
offering a set of interpolants from which the most suitable 
precision may be chosen. 
III. Sliced Prefixes

Our novel technique can be used to extend any approach 
that is based on CEGAR. Slice-based refinement selection 
extracts from a given infeasible error path not only one single 
interpolation problem for obtaining a refined precision, but a set 
of (more abstract, sliced) infeasible error paths and thus a set of 
interpolation problems, from which the refined precision can be 
derived. The interpolation problems for the extracted paths are 
given, one by one, to the interpolation engine, in order to derive 
interpolants for each path individually. Hence, the abstraction 
refinement of the analysis is no longer dependent on what the 
interpolation engine produces, but instead it is free to choose
from a set of interpolants the one it finds most suitable. The 
move from solving a single interpolation problem to solving 
multiple interpolation problems to enable refinement selection, 
and in the process transforming the refinement selection into 
an optimization problem, is a key insight of our approach. 

Infeasible Sliced Prefixes. A CEGAR-based analysis usually
encounters an infeasible error path due to the coarse precision
that it starts with. This occurs when there exists a path to
the error location that contains at least one assume operation
that is feasible when the reachability algorithm computes
abstract successors based on the current precision, but is
actually contradicting under the concrete semantics of the
program. Every infeasible error path contains at least one
such contradicting assume operation, but often, there exist
several independent contradicting assume operations in an
infeasible error path, which leads to the notion of infeasible
sliced prefixes: A path $\phi = \langle (op_1, l_1), \ldots, (op_w, l_w) \rangle$ is a
sliced prefix for a program path $\sigma = \langle (op_1, l_1), \ldots, (op_n, l_n) \rangle$
if $w \le n$ and for all $1 \le i \le w$, we have $\phi.l_i = \sigma.l_i$ and
($\phi.op_i = \sigma.op_i$ or ($\phi.op_i = [true]$ and $\sigma.op_i$ is an assume operation)),
i.e., a sliced prefix results from a path by omitting pairs of
operations and locations from the end, and possibly replacing
some assume operations by no-op operations. If a sliced prefix
for $\sigma$ is infeasible, then $\sigma$ is infeasible.
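The definition of a sliced prefix is purely syntactic and can be checked directly. The following Python sketch assumes a hypothetical encoding in which a path is a list of `(operation, location)` pairs and an operation is a `(kind, text)` pair with kind `"assign"` or `"assume"`; the no-op is the assume operation `[true]`. These names and the encoding are illustrative assumptions, not part of the original formalization.

```python
from typing import List, Tuple

Operation = Tuple[str, str]          # (kind, text), kind is "assign" or "assume"
Path = List[Tuple[Operation, int]]   # pairs of operation and location

NOOP: Operation = ("assume", "true")


def is_sliced_prefix(phi: Path, sigma: Path) -> bool:
    """Check whether phi is a sliced prefix of sigma."""
    if len(phi) > len(sigma):
        return False
    for (phi_op, phi_loc), (sig_op, sig_loc) in zip(phi, sigma):
        if phi_loc != sig_loc:
            return False
        same = phi_op == sig_op
        sliced = phi_op == NOOP and sig_op[0] == "assume"
        if not (same or sliced):
            return False
    return True


sigma = [
    (("assign", "b := 0"), 1),
    (("assume", "i > 9"), 2),
    (("assume", "b != 0"), 3),
]
phi = [(("assign", "b := 0"), 1), NOOP_AT_2 := (NOOP, 2), (("assume", "b != 0"), 3)]
not_a_slice = [(NOOP, 1)]  # replaces an assignment, which is not allowed
```

With this encoding, `phi` is a sliced prefix of `sigma` (the assume operation at location 2 was replaced by a no-op), whereas `not_a_slice` is rejected because only assume operations may be replaced.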

Extracting Infeasible Sliced Prefixes from an Infeasible
Error Path. Algorithm 3 is capable of extracting from an
infeasible error path all its infeasible sliced prefixes, i.e., all
paths from the initial program operation to a contradicting
assume operation. The algorithm iterates through the given
infeasible error path $\sigma$. It keeps incrementing a sliced path
prefix $\sigma_f$ that contains all operations from $\sigma$ that were seen
so far, except the contradicting assume operations, which are
replaced by no-op operations. Thus, $\sigma_f$ always stays feasible.
For every element $(op, l)$ from the original path $\sigma$ (iterating
in order from the first to the last pair), we check whether it
contradicts $\sigma_f$, which is the case if the result of the strongest-post
operator for the path $\sigma_f \wedge (op, l)$ is contradicting (denoted
by $\bot$). If so, the algorithm has found a new infeasible sliced
prefix. In any case, it continues with the next element after
extending $\sigma_f$ (either by the current operation, or by a no-op
operation if the current operation is contradicting). When the
algorithm terminates, which is guaranteed because $\sigma$ is finite,
the set $\Sigma$ contains all infeasible sliced prefixes of $\sigma$. There
is always at least one infeasible sliced prefix because $\sigma$ is
infeasible.

Algorithm 3 ExtractSlicedPrefixes($\sigma$)

Input: an infeasible path $\sigma = \langle (op_1, l_1), \ldots, (op_n, l_n) \rangle$
Output: a non-empty set $\Sigma = \{\phi_1, \ldots, \phi_n\}$ of infeasible sliced
    prefixes of $\sigma$
Variables: a path $\sigma_f$ that is always feasible

    $\Sigma$ := $\{\}$; $\sigma_f$ := $\langle \rangle$
    for each $(op, l) \in \sigma$ do    // iterate in order from $(op_1, l_1)$ to $(op_n, l_n)$
        if $SP_{\sigma_f \wedge (op, l)}(\top) = \bot$ then
            // add $\sigma_f \wedge (op, l)$ to the set of infeasible sliced prefixes
            $\Sigma$ := $\Sigma \cup \{\sigma_f \wedge (op, l)\}$
            $\sigma_f$ := $\sigma_f \wedge ([true], l)$    // append no-op
        else
            $\sigma_f$ := $\sigma_f \wedge (op, l)$    // append original pair
    return $\Sigma$
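To make the extraction procedure tangible, here is a minimal Python sketch of it over a simple value domain. It rests on several assumptions that are not part of the original text: operations are encoded as `(kind, name, function)` triples, the strongest-post operator is a simplified version that only detects contradictions when all referenced variables have known values, and the example path is a hand-written error path (the loop is left immediately) through a program similar to the one in Fig. 2. All identifiers are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Optional[Dict[str, int]]         # None encodes the contradicting assignment
Op = Tuple[str, str, Callable]           # (kind, variable-or-label, function)
Path = List[Tuple[Op, int]]              # pairs of operation and location

NOOP: Op = ("assume", "true", lambda s: True)


def sp(op: Op, v: State) -> State:
    """Simplified strongest post for one operation in a value domain."""
    kind, name, fn = op
    if v is None:
        return None
    try:
        value = fn(v)
    except KeyError:  # refers to a variable without known value
        return {y: c for y, c in v.items() if y != name} if kind == "assign" else v
    if kind == "assign":
        return {**v, name: value}
    return v if value else None


def sp_path(path: Path) -> State:
    v: State = {}
    for op, _loc in path:
        v = sp(op, v)
    return v


def extract_sliced_prefixes(sigma: Path) -> List[Path]:
    prefixes: List[Path] = []
    feasible: Path = []                      # the always-feasible path sigma_f
    for op, loc in sigma:
        candidate = feasible + [(op, loc)]
        if sp_path(candidate) is None:       # op contradicts sigma_f
            prefixes.append(candidate)
            feasible = feasible + [(NOOP, loc)]
        else:
            feasible = candidate
    return prefixes


sigma: Path = [
    (("assign", "b", lambda s: 0), 1),
    (("assign", "i", lambda s: 0), 2),
    (("assume", "i > 9", lambda s: s["i"] > 9), 3),
    (("assume", "b != 0", lambda s: s["b"] != 0), 4),
    (("assume", "i != 10", lambda s: s["i"] != 10), 5),
]
prefixes = extract_sliced_prefixes(sigma)
labels = [[op[1] for op, _loc in prefix] for prefix in prefixes]
```

For this path the sketch finds two infeasible sliced prefixes: one ending in `[i > 9]` and one ending in `[b != 0]`, in which `[i > 9]` has been replaced by a no-op. The final assume `[i != 10]` does not contradict the sliced path, because `i` still has the value 0 there.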

Algorithm 3 returns the set of all infeasible sliced prefixes.
Each of these sliced prefixes has some interesting characteristics:
(1) Each sliced prefix $\phi$ starts with the initial operation $op_1$,
and ends with an assume operation that contradicts the previous
operations of the sliced prefix, i.e., $SP_\phi(\top) = \bot$. (2) The $i$-th
sliced prefix, excluding its (final and only) contradicting
assume operation and location, is a prefix of the $(i+1)$-st
sliced prefix. (3) All sliced prefixes differ from a prefix of the
original infeasible error path $\sigma$ only in their no-op operations.

The visualizations in Fig. 4 capture the details of this process.
Figure 4a shows the original error path. Nodes represent
program locations and edges represent operations between
these locations (assignments to variables or assume operations
over variables, the latter denoted with brackets). To allow
easier distinction, program locations that are followed by
assume operations are drawn as diamonds, while other program
locations are drawn as squares. Contradicting assume operations
are drawn with a filled background. The sequence of operations
ends in the error state, denoted by $l_e$. Figure 4b depicts the
cascade-like sliced prefixes that the algorithm encounters during
its progress. Figure 4c shows the three infeasible sliced prefixes
that Alg. 3 returns for this example.

The refinement procedure can now use any of these sliced 
prefixes to create interpolation problems, and is not bound to 
a single sequence of interpolants for a single infeasible error 
path; a refinement selection from different precisions is now 
possible. The following proposition states that this is a valid 
refinement process. 


Proposition. Let $\sigma$ be an infeasible error path and $\phi_i$ be the $i$-th
infeasible sliced prefix for $\sigma$ that is extracted by Alg. 3, then
all interpolant sequences for $\phi_i$ are also interpolant sequences
for $\sigma$.


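Proof. Let $\sigma = \langle (op_1, l_1), \ldots, (op_n, l_n) \rangle$ and
$\phi_i = \langle (op'_1, l_1), \ldots, (op'_w, l_w) \rangle$. Let $\Gamma^j_{\phi_i}$ be the $j$-th
interpolant of an interpolant sequence for $\phi_i$, i.e., for the
two constraint sequences $\gamma^-_{\phi_i} = \langle op'_1, \ldots, op'_j \rangle$ and
$\gamma^+_{\phi_i} = \langle op'_{j+1}, \ldots, op'_w \rangle$, with $1 \le j < w$. Because $\phi_i$ is
infeasible, the two constraint sequences $\gamma^-_{\phi_i}$ and $\gamma^+_{\phi_i}$ are
contradicting, and therefore, $\Gamma^j_{\phi_i}$ exists [11]. The interpolant
$\Gamma^j_{\phi_i}$ is also an interpolant for $\gamma^-_\sigma = \langle op_1, \ldots, op_j \rangle$ and
$\gamma^+_\sigma = \langle op_{j+1}, \ldots, op_n \rangle$, if (1) the implication
$\gamma^-_\sigma \Rightarrow \Gamma^j_{\phi_i}$ holds, (2) the conjunction $\Gamma^j_{\phi_i} \wedge \gamma^+_\sigma$ is contradicting, and
(3) the interpolant $\Gamma^j_{\phi_i}$ contains only variables that occur in
both $\gamma^-_\sigma$ and $\gamma^+_\sigma$. Consider that $\gamma^-_{\phi_i}$ was created from $\gamma^-_\sigma$
by replacing some assume operations by no-op operations,
and that $\gamma^+_{\phi_i}$ was created from $\gamma^+_\sigma$ by replacing some assume
operations by no-op operations and by removing the operations
$\langle op_{w+1}, \ldots, op_n \rangle$ at the end. Thus, both $\gamma^-_{\phi_i}$ and $\gamma^+_{\phi_i}$ do not
contain any additional constraints (except for no-op operations)
compared to $\gamma^-_\sigma$ and $\gamma^+_\sigma$, respectively. Because $\Gamma^j_{\phi_i}$ is an interpolant
for $\gamma^-_{\phi_i}$ and $\gamma^+_{\phi_i}$, we know that $\gamma^-_{\phi_i} \Rightarrow \Gamma^j_{\phi_i}$ holds, and because
$\gamma^-_\sigma$ can only be stronger than $\gamma^-_{\phi_i}$, Claim (1) follows. The
conjunction $\Gamma^j_{\phi_i} \wedge \gamma^+_{\phi_i}$ is contradicting, and $\gamma^+_\sigma$ can only
be stronger than $\gamma^+_{\phi_i}$. Thus, Claim (2) holds. Because $\Gamma^j_{\phi_i}$
references only variables that occur in both $\gamma^-_{\phi_i}$ and $\gamma^+_{\phi_i}$,
which do not contain more variables than $\gamma^-_\sigma$ and $\gamma^+_\sigma$, resp.,
Claim (3) holds.

Fig. 4: From one infeasible error path to a set of infeasible sliced prefixes
(panel (b): cascade of sliced prefixes)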


IV. Slice-Based Refinement Selection

As described earlier, extracting good precisions from the 
infeasible error paths is key to the CEGAR technique, and the 
choice of interpolants influences the quality of the precision, 
and thus, the effectiveness of the analysis algorithm. By using 
the results introduced in the previous section, the refinement 
procedure can now be improved by selecting a precision that 
is derived via interpolation from a selected sliced prefix. 

Algorithm 4 shows our algorithm for slice-based refinement
selection, which can be used as a replacement for the classic
refinement procedure in the CEGAR algorithm and chooses a suitable
interpolant sequence during the refinement step. First, this
algorithm uses ExtractSlicedPrefixes to extract all infeasible
sliced prefixes. Second, it computes interpolant sequences for all
of them and stores them in the mapping $\tau$. Third, one sliced
prefix is chosen by a heuristic (in function ChooseSlicedPrefix),
and fourth, the returned precision is created from the interpolants
for the chosen sliced prefix. The heuristic can decide based
on the information contained in the sliced prefixes as well as
in the interpolants, e.g., which variables are referenced by the
interpolants.

Algorithm 4 Refine+($\sigma$)

Input: an infeasible error path $\sigma = \langle (op_1, l_1), \ldots, (op_n, l_n) \rangle$
Output: a precision $\pi$
Variables: a constraint sequence $\Gamma$,
  a set $\Sigma$ of infeasible sliced prefixes of $\sigma$,
  a mapping $\tau$ from infeasible sliced prefixes and program locations
  to interpolants

 1: $\Sigma$ := ExtractSlicedPrefixes($\sigma$)
 2: // compute interpolants for each location in each prefix
 3: for each $\phi_j = \langle (op_1, l_1), \ldots, (op_w, l_w) \rangle \in \Sigma$ do
 4:   $\Gamma$ := $\langle\rangle$
 5:   for $i$ := 1 to $w - 1$ do
 6:     $\gamma^+_i$ := $\langle op_{i+1}, \ldots, op_w \rangle$
 7:     $\Gamma$ := Interpolate($\Gamma \wedge op_i$, $\gamma^+_i$) // inductive interpolation
 8:     $\tau(\phi_j, l_i)$ := $\Gamma$
 9: // choose suitable sliced prefix
10: // (based on the sliced prefixes and their interpolants)
11: $\phi_{selected}$ := ChooseSlicedPrefix($\tau$)
12: // create precision based on chosen interpolants
13: $\pi(l)$ := {}, for all program locations $l$
14: for each $(op, l) \in \phi_{selected}$ do
15:   $\Gamma$ := $\tau(\phi_{selected}, l)$
16:   $\pi(l)$ := ExtractPrecision($\Gamma$) // create precision based on $\Gamma$
17: return $\pi$
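
The control flow of Refine+ can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the CPAchecker implementation: the four callables stand in for ExtractSlicedPrefixes, Interpolate, ChooseSlicedPrefix, and ExtractPrecision, and their signatures are hypothetical. The `interpolate` callable receives the previous interpolant and the current operation separately and is expected to conjoin them itself, with `None` standing for the empty constraint sequence.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, Iterable, List, Set, Tuple

Path = Tuple[Tuple[str, str], ...]   # ((operation, location), ...)
Tau = Dict[Tuple[Path, str], Any]    # (sliced prefix, location) -> interpolant


def refine_plus(
    sigma: Path,
    extract_sliced_prefixes: Callable[[Path], Iterable[Path]],
    interpolate: Callable[[Any, str, List[str]], Any],
    choose_sliced_prefix: Callable[[Tau], Path],
    extract_precision: Callable[[Any], Set[str]],
) -> DefaultDict[str, Set[str]]:
    """Sketch of Refine+: returns a precision (location -> tracked facts)."""
    tau: Tau = {}
    # Compute interpolants for each location in each infeasible sliced prefix.
    for phi in extract_sliced_prefixes(sigma):
        ops = [op for op, _ in phi]
        gamma = None  # assumption: None encodes the empty constraint sequence
        for i in range(len(phi) - 1):
            # Inductive interpolation: previous interpolant plus current
            # operation against the remaining operations of the prefix.
            gamma = interpolate(gamma, ops[i], ops[i + 1:])
            tau[(phi, phi[i][1])] = gamma
    # Choose a suitable sliced prefix based on the prefixes and interpolants.
    selected = choose_sliced_prefix(tau)
    # Create the precision; locations without an interpolant map to {}.
    precision: DefaultDict[str, Set[str]] = defaultdict(set)
    for _, location in selected:
        if (selected, location) in tau:
            precision[location] = extract_precision(tau[(selected, location)])
    return precision
```

Passing the heuristic as an argument mirrors the design of the algorithm: the interpolation loop and the precision construction stay fixed, and only the selection step varies.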

Refinement-Selection Heuristics. We regard the selection of
interpolants for refinement as an independent direction for
further research, but present several ideas on how to select
interpolants here. There are two obvious options for interpolant
selection that do not depend on the actual interpolants. Using
the interpolant sequence derived from the very first, i.e., the
shortest, infeasible prefix may rule out many similar infeasible
error paths. The downside of this choice is that the analysis
has to track information very early, possibly blowing up
the state space and making the analysis less efficient. The
other straightforward option (also known as counterexample
minimization) is to use the longest infeasible sliced
prefix (containing the last contradicting assume operation) for
computing an interpolant sequence. This may lead to a precision
that is local to the error location and does not require refining
large parts of the state space at the beginning of the error path.
However, it may also lead to a larger number of refinements if
many error paths with a common prefix exist. A more advanced
strategy is to analyze the domain types of the variables that
are referenced in the interpolant sequence. Each interpolant
sequence can be assigned a score that depends on the domain
types of its variables, such that the score is better if the
sequence references only ‘easy’ types of variables, e.g., boolean
variables, and no integer variables or even loop counters. This
allows the analysis to focus on variables that are inexpensive to
track, avoid loop unrolling where possible, and keep the size of
the abstract state space as small as possible. Furthermore, it is
possible to estimate, by means of the use-def relation of the
variables in the interpolants, how much of the already explored
state space has to be recomputed depending on which interpolant
sequence is chosen. Based on that insight, we can identify the
interpolant sequence that ensures that as little of the state
space as possible needs to be re-explored. In addition to that,
many different refinement heuristics are conceivable. For example,
it would be possible to avoid sliced error paths that contain
non-linear arithmetic if using predicate abstraction with an
SMT solver for linear arithmetic.
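
The domain-type heuristic described above can be illustrated with a small scoring function. This is a sketch under assumptions: the three cost classes and their numeric weights are hypothetical and chosen only so that boolean variables are cheapest and loop counters most expensive. The text does not specify the actual scoring used in the tool.

```python
from enum import IntEnum
from typing import Iterable, Mapping, Set


class DomainType(IntEnum):
    """Hypothetical cost classes; a lower value means cheaper to track."""
    BOOLEAN = 1
    INTEGER = 10
    LOOP_COUNTER = 100


def score(variables: Iterable[str], domain_type: Mapping[str, DomainType]) -> int:
    """Sum the costs of the distinct variables referenced by an interpolant sequence.

    Variables without a known domain type are treated as integers (assumption).
    """
    return sum(domain_type.get(v, DomainType.INTEGER) for v in set(variables))


def choose_by_domain_type(
    candidates: Mapping[str, Set[str]],
    domain_type: Mapping[str, DomainType],
) -> str:
    """Return the id of the candidate whose interpolants have the lowest score.

    `candidates` maps a sliced-prefix id to the variables referenced by its
    interpolant sequence; ties are broken by the id to keep the choice stable.
    """
    return min(candidates, key=lambda c: (score(candidates[c], domain_type), c))
```

For example, if one sliced prefix yields interpolants over a loop counter `i` and an integer `n`, and another yields interpolants over a boolean `flag` only, the second prefix receives the lower score and is selected.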

In general, any such heuristic can be used without changing
the overall algorithm; only the function ChooseSlicedPrefix
in Alg. 4 needs to be replaced accordingly. Using a selection
heuristic specifically developed for programs encoding an
event-condition-action system improved the effectiveness of our tool
CPAchecker in the RERS challenge 2014 and allowed it to
obtain two gold medals and one bronze medal, as well as two special
achievements. This shows that optimizing the CEGAR loop
by using domain knowledge in the refinement step can be
rewarding, and that our approach provides a possibility to do
so easily. In the following, we present detailed results for the
effectiveness of our approach for a value analysis with the
heuristic based on domain types.
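
To illustrate how only the selection function changes, the two interpolant-independent options mentioned earlier (shortest and longest infeasible sliced prefix) can be written as interchangeable strategies. This is a sketch with hypothetical names and types; measuring a prefix by its number of operations is an assumption made for the example.

```python
from typing import Callable, Sequence, Tuple

Prefix = Tuple[Tuple[str, str], ...]   # ((operation, location), ...)
Heuristic = Callable[[Sequence[Prefix]], Prefix]


def shortest_prefix(prefixes: Sequence[Prefix]) -> Prefix:
    """Refine on the first contradiction; may rule out many similar paths."""
    return min(prefixes, key=len)


def longest_prefix(prefixes: Sequence[Prefix]) -> Prefix:
    """Refine on the last contradiction; keeps the precision local."""
    return max(prefixes, key=len)


def select(prefixes: Sequence[Prefix], heuristic: Heuristic = shortest_prefix) -> Prefix:
    """Apply the given heuristic; the surrounding refinement is unchanged."""
    if not prefixes:
        raise ValueError("expected at least one infeasible sliced prefix")
    return heuristic(prefixes)
```

Switching from `select(prefixes)` to `select(prefixes, longest_prefix)` changes which interpolant sequence is used for the refinement without touching any other part of the procedure.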

V. Experiments 

We implemented our approach in the open-source verification
framework CPAchecker, which is available online under the
Apache 2.0 license. CPAchecker already has several analyses
implemented that can be used for program analysis with
CEGAR and lazy abstraction. We only extended the refinement
process to work according to Alg. 4 (Refine+), and changed
neither the abstract domains nor the interpolation
engines. Our implementation is available in the source-code
repository of CPAchecker. The tool, the benchmark programs,
the configuration files, and the complete results are available
on the supplementary web page.

Setup. We used the same experimental setup as in the International
Competition on Software Verification (SV-COMP’14):
machines with Intel Core i7-2600 quad-core CPUs with
3.4 GHz, a memory limit of 15 GB, and a time limit of 15 min.
We limited each verification run to one CPU core, because we
are interested in the consumed CPU time; the consumed
wall time was not important.

Benchmark Programs. For benchmarking we used the C programs
of the category “DeviceDrivers64” of SV-COMP’14. This category
contains 1 428 large programs based on real-world Linux-kernel
device drivers with an average of 6 045 lines of code per program.
We consider this category to be especially interesting because our
approach focuses on improving refinements in large programs (with
long and complex error paths, and many contradicting assume
operations per error path). Verification of device drivers is a
challenging research topic [12] and an important application
domain [5, 21]. For completeness, we also report the results for
the 2 626 programs of all categories of SV-COMP’14 except
“Concurrency”, “HeapManipulation”, “MemorySafety”, and
“Recursive”, which rely on features that were not supported by
the used configurations of our tool.

Results of the RERS challenge 2014 are available at http://www.rers-challenge.org/2014Isola/
CPAchecker: http://cpachecker.sosy-lab.org
Supplementary web page: http://www.sosy-lab.org/~dbeyer/cpa-ref-sel/

TABLE I: Results for slice-based refinement selection

Tasks           DeviceDrivers64          All
                (1 428 tasks)            (2 626 tasks)
Configuration   Classic     Sliced       Classic     Sliced
# Solved        1 328       1 375        1 932       1 996
CPU time (h)    28.4        16.9         171         156




Fig. 5: Scatter plots comparing the CPU time when not using
slice-based refinement selection (x-axis) with the CPU time when
using slice-based refinement selection (y-axis): (a) on category
“DeviceDrivers64”, (b) on all 2 626 verification tasks


Configurations. Out of the several abstract domains that are
supported by CPAchecker, we chose the value analysis with
refinement and lazy abstraction for our experiments. This
abstract domain tracks explicit values for each program variable,
and in case the safety of the program depends on facts that
cannot be handled by the value analysis, it delegates to an
auxiliary predicate analysis, which is configured for
single-block encoding [10]. We used CPAchecker in revision 15 509
of tag cpachecker-1.3.10-refinementSelection.

When using slice-based refinement selection, the heuristic
for choosing sliced prefixes (function ChooseSlicedPrefix in
Alg. 4) was configured to select the interpolant sequence with
the best score based on the domain types of the variables
referenced in the interpolants, i.e., variables with a boolean
character are favored over integer variables and loop counters.

Results. We now compare the results of running the analysis
with a classic refinement algorithm and with our new refinement
algorithm that is based on sliced prefixes (Alg. 4). Table I
shows a summary of the results. The new approach proves to be
effective: it correctly solves 1 375 of the 1 428 programs in the
category “DeviceDrivers64”. Compared to the existing approach, it
solves 47 more programs correctly and still verifies all programs
that could be verified before (no regressions). At the same time,
the total CPU time was reduced to 60 % (from 28.4 h to 16.9 h). The
reason for this improvement is that the heuristic for choosing
sliced prefixes (guided by the domain type of the referenced
variables) is especially effective for the highly complex and
heterogeneous program code in Linux-kernel device drivers. On the
set of all programs, slice-based refinement selection is effective,
too: it solves 64 more verification tasks correctly and needs
almost 10 % less CPU time (156 h instead of 171 h).
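
The figures quoted above follow directly from Table I. The short computation below reproduces them; the dictionary layout and the function name are chosen for this illustration only, and the numbers are copied from the table.

```python
from typing import Tuple

# (classic, sliced) values copied from Table I
SOLVED = {"DeviceDrivers64": (1328, 1375), "All": (1932, 1996)}
CPU_HOURS = {"DeviceDrivers64": (28.4, 16.9), "All": (171.0, 156.0)}


def improvement(tasks: str) -> Tuple[int, float]:
    """Return (additionally solved tasks, sliced CPU time / classic CPU time)."""
    classic, sliced = SOLVED[tasks]
    t_classic, t_sliced = CPU_HOURS[tasks]
    return sliced - classic, t_sliced / t_classic


for tasks in SOLVED:
    additional, ratio = improvement(tasks)
    print(f"{tasks}: {additional} more solved, CPU time at {ratio:.0%} of classic")
```

For “DeviceDrivers64” this gives 47 additional tasks and a CPU-time ratio of about 0.60; for all tasks it gives 64 additional tasks and a ratio of about 0.91, i.e., a reduction of almost 10 %.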

Figure 5 shows scatter plots that compare the CPU time of
slice-based refinement selection with that of the existing approach
on both sets of verification tasks. Only data points for successful
verification runs and timeouts are shown (out-of-memory runs
are omitted). The plots show that in many cases our approach
makes the difference between solving the verification task
within the time limit and not solving it at all (such instances
are those at the right border of the plot). This illustrates that
without slice-based refinement selection and our heuristic for
avoiding loop counters in the precision, the interpolants will
sometimes be such that the analysis has to unroll long loops,
which causes state-space explosion; this can often be avoided
with the new approach. The plots also show that for most of the
remaining programs there is no difference in time. This is
because both sets also contain a large number of small programs,
for which our approach does not make a difference, because the
counterexamples are short and simple. Figure 5a shows that for
the category “DeviceDrivers64”, there is not a single effectiveness
regression, i.e., all verification tasks that the classic approach
can solve can also be solved by slice-based refinement selection,
plus 47 more. Figure 5b shows that on the set of all programs,
there are a few regressions where the new approach runs into a
timeout. These are randomly created programs that
belong to the “ECA” subset of SV-COMP’14. All variables
in these programs have the same domain type, and thus, our
heuristic for choosing interpolants based on the domain types
of variables is not effective here. For this subset, a heuristic
specifically developed for the ECA programs of RERS’14 was
successful.

VI. Conclusion 

In this work we presented our novel approach of slice-based 
refinement selection, which extracts several infeasible sliced 
prefixes from one single infeasible error path. From any of 
these infeasible sliced prefixes, an independent interpolation 
problem can be derived that can be solved by a standard 
interpolation engine, and the analysis can choose from the 
resulting interpolant sequences the one thought to be best for 
the verification. Our approach is independent of the
abstract domain (in particular, it does not depend on an SMT
solver) and can be combined with any analysis that is based on
CEGAR and interpolation-based abstraction refinement, while
previous work on guided interpolation [22] is applicable only to
SMT-based approaches. We experimentally demonstrated that
the novel approach using a heuristic based on domain types 
can significantly improve the effectiveness and efficiency of 
the program analysis. We also discussed some possible further 
heuristics to select suitable interpolant sequences. 